Maximising TM Performance through Sub-Tree Alignment and SMT
Abstract
With the steadily increasing demand for high-quality translation, the localisation industry is constantly searching for technologies that would increase translator throughput, focusing in particular on the use of high-quality Statistical Machine Translation (SMT) to supplement the established Translation Memory (TM) technology. In this paper, we present a novel modular approach that utilises state-of-the-art sub-tree alignment and SMT techniques to turn the fuzzy matches from a TM into near-perfect translations. Rather than relegating SMT to a last-resort status, where it is used only when the TM system fails to produce the desired output, we treat SMT as an integral part of the translation process that we rely on to obtain high-quality results. We show that the presented system consistently produces better-quality output than the TM and performs on par with or better than the standalone SMT system.
Similar Resources
Seeding Statistical Machine Translation with Translation Memory Output through Tree-Based Structural Alignment
With the steadily increasing demand for high-quality translation, the localisation industry is constantly searching for technologies that would increase translator throughput, with the current focus on the use of high-quality Statistical Machine Translation (SMT) as a supplement to the established Translation Memory (TM) technology. In this paper we present a novel modular approach that utilise...
Discriminative Induction of Sub-Tree Alignment using Limited Labeled Data
We employ a Maximum Entropy model to conduct sub-tree alignment between bilingual phrasal structure trees. Various kinds of lexical and structural knowledge are explored to measure the syntactic similarity across Chinese-English bilingual tree pairs. In the experiment, we evaluate the sub-tree alignment using both a gold-standard treebank and an automatically parsed corpus with manually annotated sub-tree ...
Capturing Translational Divergences with a Statistical Tree-to-Tree Aligner
Parallel treebanks, which comprise paired source-target parse trees aligned at sub-sentential level, could be useful for many applications, particularly data-driven machine translation. In this paper, we focus on how translational divergences are captured within a parallel treebank using a fully automatic statistical tree-to-tree aligner. We observe that while the algorithm performs well at the...
Rich Linguistic Features for Translation Memory-Inspired Consistent Translation
We improve translation memory (TM)-inspired consistent phrase-based statistical machine translation (PB-SMT) using rich linguistic information including lexical, part-of-speech, dependency, and semantic role features to predict whether a TM-derived sub-segment should constrain PB-SMT translation. Besides better translation consistency, for English-to-Chinese Symantec TMs we report a 1.01 BLEU po...
An Efficient Framework to Extract Parallel Units from Comparable Data
Since the quality of statistical machine translation (SMT) is heavily dependent upon the size and quality of training data, many approaches have been proposed for automatically mining bilingual text from comparable corpora. However, the existing solutions are restricted to extracting either bilingual sentences or sub-sentential fragments. Instead, we present an efficient framework to extract both ...